manipulation task
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Information Technology (0.67)
- Leisure & Entertainment (0.45)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > Montana (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > Puerto Rico > San Juan > San Juan (0.04)
- North America > Canada (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Asia > South Korea > Seoul > Seoul (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation
Benefiting from language flexibility and compositionality, humans naturally intend to use language to command an embodied agent for complex tasks such as navigation and object manipulation. In this work, we aim to fill the blank of the last mile of embodied agents--object manipulation by following human guidance, e.g., "move the red mug next to the box while keeping it upright." To this end, we introduce an Automatic Manipulation Solver (AMSolver) system and build a Vision-and-Language Manipulation benchmark (VLMbench) based on it, containing various language instructions on categorized robotic manipulation tasks. Specifically, modular rule-based task templates are created to automatically generate robot demonstrations with language instructions, consisting of diverse object shapes and appearances, action types, and motion constraints. We also develop a keypoint-based model 6D-CLIPort to deal with multi-view observations and language input and output a sequence of 6 degrees of freedom (DoF) actions. We hope the new simulator and benchmark will facilitate future research on language-guided robotic manipulation.
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- North America > United States > California > Santa Cruz County > Santa Cruz (0.04)